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Overview 



* Motivation and Objectives 

* NASA’s Information Power Grid 

* Grid Benchmarking 

* Grid Performance Monitoring 

* User-Level Grid Scheduling 

* System-Level Scheduling 
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Motivation and Objectives 


* Large-scale science and engineering accomplished through 
interaction of geographically-dispersed people, heterogeneous 
computing resources, information systems, and instruments 

* Overall goal is to facilitate the routine interactions of these 
resources to reduce NASA mission-critical design cycle time 

* Many facilities around the world are moving toward making 
resources available on a “Grid” (grid computing) 

* The Information Power Grid (IPG) is NASA’s push for a persistent, 
secure, and robust implementation of a Grid 

* Investigate techniques and develop tools to measure and improve 
performance of a broad class of applications when run on a Grid 
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Information Power Grid 


* Involves linking NASA’s vast, dispersed resources to create an 
intelligent, scalable, adaptive, and transparent computational, 
communication, data analysis, and storage environment 


[Figure: the IPG links human collaboration, instruments, large-scale computing, remote sensing, data exploration, and archival storage]
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IPG Layered Architecture 
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Grid Benchmarking 
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* Deficiencies of current Grid performance measurement technology 

o Simulation tools are idealized and static, with unclear Grid model assumptions 
(WARMstones, Bricks, MicroGrid) 
o Superposition principle of probes may not hold 
(Globus HBM, NWS, NetLogger) 

* Existing techniques are useful for 

o Users debugging Grid application performance 
o Developers of Grid and communication software 

* But they do not provide a metric for comparing Grid performance on 
actual distributed applications 

* Goal: 

o Determine Grid functionality and application performance objectively 
o Use representative set of distributed applications 
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Grid Benchmark Requirements 


* Tests computational aspects of environment 

* Is representative of scientific computing tasks 

* Uses basic Grid services 

* Is not intrusive (no throughput stress testing) 

* Contains communicating processes 

* Does significant communication 

* Is verifiable (deterministic, not interactively steered) 

* Needs no initialization data files 

* Is fair 


APART-2001 


NAS Grid Benchmarks (NGB) 


* Provide paper-and-pencil specifications of small set of complete but 
representative distributed applications 

* For convenience, also provide reference implementations 
(Globus, Legion, Condor, Java, ksh) 

* Focus on computational aspects of Grids 

o Use mesh-based NAS Parallel Benchmarks (NPB) as building blocks 
(well understood, calibrated, deterministic, portable, allow communication, 
parallel, no input required but output of one can be input for another) 
□ MG (multigrid for Poisson eqn): post-processing (data smoother) 

□ FT (spectral method for Laplace eqn): visualization (spectral analysis) 

□ BT (ADI, block tridiagonal): scientific computations (flow solvers) 

□ SP (ADI, scalar pentadiagonal): scientific computations (flow solvers) 

□ LU (lower-upper symmetric Gauss-Seidel): scientific computations (flow solvers) 
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NGB Construction 


* Construct synthetic Grid applications for scientific computing 

* Data Flow Graph coupling NPB codes 

* Provide wide range of problem sizes (classes): S, A, B, C, ... 

* Benchmarks non-converging, but numerically stable 

* Limit number of verification values 

* Specify abstract services: authenticate, create task, communicate 

* Do not specify mapping, scheduling, fault tolerance, data security 

* Report turnaround time and the resources used 
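
A minimal sketch of how a data flow graph of NPB codes could be represented and its turnaround time computed. The three-stage pipeline (flow solver, then smoother, then spectral analysis) and the run times are hypothetical illustrations, not one of the NGB graphs, and communication time between tasks is ignored.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each task maps to the set of tasks whose output it consumes.
graph = {"BT": set(), "MG": {"BT"}, "FT": {"MG"}}
# Hypothetical per-task run times in minutes.
run_time = {"BT": 5.0, "MG": 2.0, "FT": 3.0}

def turnaround(graph, run_time):
    """Finish time of the last task when every task starts as soon as its inputs exist."""
    finish = {}
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph.get(task, ())), default=0.0)
        finish[task] = start + run_time[task]
    return max(finish.values())

total = turnaround(graph, run_time)
```

A real run would add the time to move each task's output to its successor, which is where the Grid itself is exercised.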



NGB Data Flow Graphs (Class S) 
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NGB Issues 


* Are proposed Data Flow Graphs representative of scientific apps? 

* What other classes of apps should be used? 

* Is turnaround time the best measure? 

* Do we need to consider a Grid currency (G$)? 

* How to interpret the results? 

o Primitive Grid services (functionality, consistency among runs) 
o Reservation of resources (variation of single resource) 
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Grid Performance Monitoring 
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* IPG is a large distributed set of resources, services, and applications 

o Failures will occur, so it needs to be monitored 
o Must be managed 

* Develop general framework for observation and control 

o Observe and control variety of resources, services, and applications 
o Scalable, secure, and compatible with emerging GGF standards 
o Extensible to observe new events, perform new actions, and manage 

* Deficiencies of existing monitors 

o Cannot be embedded in tools or apps (AIMS, Big Brother) 
o Limited fault detection functionality (Globus HBM, NWS) 
o System- or app-specific information, but not both (SNMP-based tools, 
MPICH profiling) 
o Lack of extensible data forwarding and gathering mechanisms (NetLogger) 
o Incompatibility with IPG security and authentication requirements 
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CODE: Control and Observation of Distributed Environments 




[Figure: CODE architecture diagram]






o Directory Service contains info about Observers & Actors for Director 
o Sensor Manager manages sensors, subscriptions, queries 
o Actuator Manager handles requests for actions 
o Expert System + User Rules instead of Management Logic in Director 
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CODE Implementation 


* In C++ to be modular and extensible 

* Uses pthreads 

* Communicates using TCP, UDP, or SSL 

* OpenSSL for authentication and security 

* XML encoding of messages 

* Data in Directory Service compatible with LDAP schemas 

* CLIPS expert system available as alternative in Director 

* Initially targeting IRIX, Solaris, Linux 

* Ported Director code to Java for GUI 






Grid Management System Using CODE 


* Observe and control a Globus-based computational Grid like IPG 

o Becomes difficult as Grids get larger 

* Things to observe 

o Globus Resource Allocation Manager (GRAM) reporter daemons 
o Grid Information Service (GIS) servers 
o Log files 

o Resource status and usage 

* Things to control 

o Restarting GRAM daemons 
o Restarting / configuring GIS servers 
o Add / remove user mapping 
o Send appropriate e-mail 

* Provide a GUI 
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Grid Control System Using CODE 


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User-Level Grid Scheduling 


* Grids have lots of different computers 

* Where should a user submit his application? 

o Which machines can user access? 
o Which machines have sufficient resources? 
o How much do machines cost to use? 
o When will the application finish? 

□ Time to pre-stage input files 

□ Time waiting in scheduler queue 

□ Time to execute 

□ Time to post-stage output files 

* Currently ignore time to stage files 
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Approach 


* Develop execution time prediction technique 

o Instance-based learning using historical information 

* Develop queue wait time prediction technique 

o Simulate scheduling algorithms 
o Use execution time predictions 

* Add the two predicted times to obtain application turnaround time 

* Select resources with minimum turnaround time 
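
The selection step above can be sketched as follows. The machine names and predicted times are made-up values for illustration; in the approach described here they would come from the execution time and queue wait time predictors.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_wait_min: float   # predicted time in the scheduler queue
    predicted_run_min: float    # predicted execution time

    @property
    def turnaround_min(self) -> float:
        # Turnaround is the sum of the two predictions (file staging is ignored).
        return self.predicted_wait_min + self.predicted_run_min

def select_resource(candidates):
    """Return the candidate with the smallest predicted turnaround time."""
    return min(candidates, key=lambda c: c.turnaround_min)

# Hypothetical candidates.
candidates = [
    Candidate("machine_a", 30.0, 60.0),
    Candidate("machine_b", 5.0, 90.0),
    Candidate("machine_c", 45.0, 40.0),
]
best = select_resource(candidates)
```

Note that the fastest machine to start (machine_b) and the machine with the shortest queue are not necessarily the one with the best turnaround.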
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Instance-Based Learning 


* Aka: locally-weighted learning, memory-based learning, lazy 
learning 

* Maintain a database of experiences 

o Each experience has set of input and output features 

* Calculate an estimate for a query using relevant experiences 

o Relevance measured with a distance function 
o Calculation can be an average, distance weighted average, locally 
weighted regression 

o Use only nearest experiences (nearest neighbors) or all experiences 

* Local learning: not one equation that fits all data points 

* No learning phase as in neural networks 
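
A minimal sketch of an instance-based estimate, assuming purely numeric features, Euclidean distance, and a plain average over the k nearest experiences. The feature choice (node count, problem size) and the data are hypothetical.

```python
import math

def knn_estimate(experiences, query, k):
    """Average the output of the k experiences closest to the query.

    experiences: list of (input_features, output_value) pairs
    query: input features of the job to estimate
    """
    ranked = sorted(experiences, key=lambda e: math.dist(e[0], query))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical experience base: (nodes, problem size) -> runtime in minutes.
experiences = [
    ((4, 10), 20.0),
    ((8, 10), 12.0),
    ((16, 10), 7.0),
    ((4, 100), 200.0),
]
estimate = knn_estimate(experiences, (8, 12), k=2)
```

There is no training step: the database is consulted directly at query time, which is why the approach is also called lazy learning.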
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Distance Functions 


* Minkowski: $D(x,y) = \left( \sum_f |x_f - y_f|^r \right)^{1/r}$

o Manhattan ($r = 1$): $D(x,y) = \sum_f |x_f - y_f|$
o Euclidean ($r = 2$): $D(x,y) = \sqrt{\sum_f (x_f - y_f)^2}$

* Only works where features are linear

* Heterogeneous Euclidean Overlap Metric (HEOM)

o Handles features that are linear or nominal 

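$d_f(x,y) = \begin{cases} 1, & \text{if } x_f \text{ or } y_f \text{ is unknown} \\ \mathrm{overlap}_f(x,y), & \text{if } f \text{ is nominal} \\ \mathrm{rn\_diff}_f(x,y), & \text{otherwise} \end{cases}$

$\mathrm{overlap}_f(x,y) = \begin{cases} 0, & \text{if } x_f = y_f \\ 1, & \text{otherwise} \end{cases}$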
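
A sketch of the heterogeneous metric in code. Two assumptions go beyond what the slide states: the range-normalized difference is taken as |x_f - y_f| divided by the observed range of feature f (the usual HEOM definition), and the per-feature distances are combined Euclidean-style. The feature layout (user, nodes, requested minutes) and the ranges are hypothetical.

```python
import math

def heom(x, y, nominal, ranges):
    """Heterogeneous Euclidean Overlap Metric between two feature tuples.

    nominal: set of indices of nominal features
    ranges:  dict mapping each linear feature index to (max - min) of that feature
    Unknown values are represented by None.
    """
    total = 0.0
    for f, (xf, yf) in enumerate(zip(x, y)):
        if xf is None or yf is None:
            d = 1.0                          # unknown value: maximal distance
        elif f in nominal:
            d = 0.0 if xf == yf else 1.0     # overlap for nominal features
        else:
            d = abs(xf - yf) / ranges[f]     # assumed range-normalized difference
        total += d * d
    return math.sqrt(total)

# Hypothetical features: (user, nodes, requested minutes).
nominal = {0}
ranges = {1: 64.0, 2: 480.0}
d_mixed = heom(("alice", 16, 120.0), ("bob", 32, 120.0), nominal, ranges)
d_unknown = heom(("alice", None, 120.0), ("alice", 32, 120.0), nominal, ranges)
```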
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Feature Scaling 
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* Warp input space by scaling features in distance function 

$D(x,y) = \sqrt{\sum_f w_f \, d_f(x,y)^2}$

* Larger weight implies more relevant feature 


[Figure: input space warped by different feature weights; panels include $w_1 = 1, w_2 = 1$ and $w_1 = 2, w_2 = 1$]
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Kernel Regression 


* Estimate is distance weighted average of experiences 

* Weighting also called kernel function 

$E_f(q) = \frac{\sum_e K(D(q,e)) \, V_f(e)}{\sum_e K(D(q,e))}$

* Want weight -> C as d -> 0, and weight -> 0 as d -> ∞ 

* Gaussian is an example of a kernel function: $K(d) = e^{-d^2}$

* Kernel width $k$ scales distances: $K(d) = e^{-(d/k)^2}$

* Can also incorporate nearest neighbors 
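
A minimal sketch of kernel regression with feature weights, assuming linear features only (so the per-feature distance is a plain difference), a Gaussian kernel, and a kernel width applied as d / width. The one-feature data set is hypothetical.

```python
import math

def weighted_distance(x, y, weights):
    """Feature-scaled distance: sqrt(sum_f w_f * (x_f - y_f)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def gaussian_kernel(d, width):
    """Weight is 1 at d = 0 and decays toward 0 as d grows."""
    return math.exp(-((d / width) ** 2))

def kernel_estimate(experiences, query, weights, width):
    """Distance-weighted average of the output values of all experiences."""
    numerator = denominator = 0.0
    for features, value in experiences:
        k = gaussian_kernel(weighted_distance(features, query, weights), width)
        numerator += k * value
        denominator += k
    # Note: for a query far from every experience the weights can underflow to 0.
    return numerator / denominator

# Hypothetical experiences with a single input feature.
experiences = [((0.0,), 10.0), ((2.0,), 30.0)]
midpoint = kernel_estimate(experiences, (1.0,), weights=(1.0,), width=1.0)
near_first = kernel_estimate(experiences, (0.0,), weights=(1.0,), width=1.0)
```

A query halfway between the two experiences gets their average, while a query on top of the first experience is dominated by it. Restricting the loop to the nearest neighbors gives the combined variant mentioned above.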
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Parameter Selection 


* What configuration to use for prediction? 

o Number of nearest neighbors 
o Feature weights 
o Kernel width 

* Search techniques to find the best 

o Genetic algorithms 
o Simulated annealing 
o Hill climbing 

o Evaluate configuration using trace data 

* Currently, genetic algorithms show best performance 
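
A sketch of the search loop, using hill climbing because it is the shortest of the three techniques to show (the slides report that genetic algorithms performed best). It tunes only the number of nearest neighbors and scores each configuration by replaying a trace. The predictor, the trace, and all names are hypothetical.

```python
def knn_predict(history, query, k):
    """Average runtime of the k past jobs whose feature is closest to the query."""
    nearest = sorted(history, key=lambda e: abs(e[0] - query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

def mean_error(trace, k):
    """Replay the trace: predict each job from the jobs that came before it."""
    errors = []
    for i in range(1, len(trace)):
        feature, actual = trace[i]
        errors.append(abs(knn_predict(trace[:i], feature, k) - actual))
    return sum(errors) / len(errors)

def hill_climb_k(trace, k_start=1, k_max=10):
    """Move to the better neighboring k until no neighbor improves the error."""
    k, best = k_start, mean_error(trace, k_start)
    while True:
        neighbors = [c for c in (k - 1, k + 1) if 1 <= c <= k_max]
        if not neighbors:
            return k, best
        candidate = min(neighbors, key=lambda c: mean_error(trace, c))
        error = mean_error(trace, candidate)
        if error >= best:
            return k, best
        k, best = candidate, error

# Hypothetical trace: (nodes, runtime in minutes) in submission order.
trace = [(1, 10.0), (2, 20.0), (1, 12.0), (2, 18.0), (1, 11.0), (2, 19.0)]
chosen_k, chosen_error = hill_climb_k(trace)
```

Feature weights and kernel width could be searched the same way by enlarging the neighborhood of a configuration.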




Execution Prediction Performance 
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* Use IBL techniques on experience base of 2000 entries 

* Predict application runtime and compare against user estimate 

* Genetic algorithm search for configuration over a month’s data from 
Steger 

* Evaluate using 6 months of data 

* Average error of prediction technique is 4.6x lower than that of user estimates 


[Table, row label Hopper: IBL prediction (mean error in mins, % of mean runtime), user estimate (mean error in mins, % of mean runtime), mean runtime (mins)]
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Queue Wait Time Predictions 
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* Predict how long an application will wait in a scheduling queue 
before starting execution 

* Perform a scheduling simulation 

o Simulate scheduling of all waiting and running applications 
o Use execution time predictions in simulation 
o Developed event-driven simulator 
o Implemented a NAS PBS simulator 

* Validated NAS PBS simulator 

o Over 6 months of data (~20K jobs), 64% of simulated start times matched the actual start times 
o Some mismatches due to dedicated time and machine crashes 

* No systematic analysis of prediction accuracy yet 
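
A minimal event-driven sketch of predicting start times by simulation. It assumes a single machine and strict first-come-first-served order with no backfilling or priorities. This is an illustrative policy, not the NAS PBS policy, which the slides do not describe. Run times fed to the simulator stand in for execution time predictions.

```python
import heapq

def predict_start_times(total_nodes, running, waiting):
    """Simulate FCFS scheduling and return predicted start times.

    running: list of (predicted_remaining_min, nodes) for jobs now executing
    waiting: list of (job_id, nodes, predicted_run_min) in queue order
    Returns {job_id: minutes from now until the job starts}.
    """
    free = total_nodes - sum(nodes for _, nodes in running)
    events = list(running)            # (finish_time, nodes released)
    heapq.heapify(events)
    now = 0.0
    starts = {}
    for job_id, nodes, run_min in waiting:
        if nodes > total_nodes:
            raise ValueError(f"{job_id} needs more nodes than the machine has")
        # Advance simulated time to completion events until the job fits.
        while free < nodes:
            now, released = heapq.heappop(events)
            free += released
        starts[job_id] = now
        free -= nodes
        heapq.heappush(events, (now + run_min, nodes))
    return starts

# Hypothetical machine state: 8 nodes, two running jobs, three queued jobs.
starts = predict_start_times(
    total_nodes=8,
    running=[(30.0, 4), (10.0, 4)],
    waiting=[("a", 4, 20.0), ("b", 8, 5.0), ("c", 2, 15.0)],
)
```

Job "c" would fit earlier if backfilling were allowed, which is one reason a simulator has to mirror the real scheduler's policy to match actual start times.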
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User-Level Scheduling 
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* Each user has their own grid scheduler 

o No bottleneck or single point of failure 

* Many potential goals for user-level schedulers 

o Minimize turnaround time of individual applications, parameter study, DAG 
of applications 
o Minimize cost 

* Minimize turnaround time of individual applications 

o User or scheduler identifies potential resources 

□ Cannot consider all grid resources for every application 
o Scheduler selects from potential set of resources using minimum predicted 
turnaround time 

o Scheduler sends application to selected resource 
o Scheduler monitors application progress and periodically checks if 
application should be moved to different resources 
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Implementation at NAS 
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* Predict for three SGI Origins from NAS workstations 

* Command line programs for predictions of execution times, start 
times, and completion times when given PBS script or PBS job ID 

* Command line program to suggest which Origin to use 

* Experience base for each Origin 

* Use NAS Parallel Benchmarks to compute scaling factors between 
machines 

* Predict for machine using its experience base, or a scaled 
prediction from other experience bases, depending on confidence 

* Cache execution predictions to improve response time 
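
A sketch of the scaled-prediction fallback. The slides do not say how confidence is measured or how scaling factors are applied, so the confidence threshold, the ratio-of-benchmark-times scaling, and all numbers below are assumptions for illustration.

```python
def scaling_factor(npb_time_source, npb_time_target):
    """Ratio of benchmark times: how much slower or faster the target machine is."""
    return npb_time_target / npb_time_source

def predict_runtime(local, remote, factor, min_confidence=0.8):
    """Use the local prediction if it is trusted, else scale the remote one.

    local, remote: (predicted_minutes, confidence in [0, 1])
    factor: scaling factor from the remote machine to the local one
    """
    local_minutes, local_confidence = local
    if local_confidence >= min_confidence:
        return local_minutes
    remote_minutes, _ = remote
    return remote_minutes * factor

# Hypothetical benchmark timings: the target machine runs the benchmark twice as fast.
factor = scaling_factor(npb_time_source=100.0, npb_time_target=50.0)
scaled = predict_runtime(local=(40.0, 0.3), remote=(60.0, 0.9), factor=factor)
trusted = predict_runtime(local=(40.0, 0.95), remote=(60.0, 0.9), factor=factor)
```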
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Execution Prediction Implementation 
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* Predict for Steger, Hopper, and Lomax from any machine in cluster 

* Separate experience base for each machine 

* Use NPBs to compute scaling factors between machines 

* Cache execution predictions to improve response time 
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